Cost of Living Throughout America

Final Project
Data Science 1 with R (STAT 301-1)

Author

Chelsea Nelson

Published

December 8, 2023

Github Repo Link

Introduction

In a comprehensive exploration of the EPI Family Budget dataset, this report delves into the intricate dynamics of cost of living variations across geographical regions, shedding light on the nuanced relationships between family budgets, income disparities, and metro classifications in the United States.

I was originally motivated to perform this analysis because I think it is both interesting and beneficial to understand how the cost of living differs, not only from state to state, but also by family size and county location. I also saw it as an opportunity to learn how income levels and expenses differ across counties and states, and to start understanding why these differences are present. Finally, by also looking at the minimum wage in each state, I think this analysis brings attention to much of the inequity present in the United States of America.

In terms of initial curiosities while conducting early research on this data, I was interested in how the median family income for each county correlates with total annual expenses at the state, regional, and metro levels. I also wanted to see whether any patterns or trends in budget allocation stand out at the county, metro, state, or regional level. Moving from annual to monthly numbers, I was interested in whether the two calculations differ and, if so, how that changes the overall cost of living in the areas of interest stated above. With these starting curiosities, I was able to conduct a full exploration of the data that focused on comparisons and correlations, bringing insight into how different categories of expenses are valued and allocated relative to total expenses and cost of living in different geographical locations.

As stated above, I will be using the Economic Policy Institute’s data on family budgets, which also tells us about the cost of living in each county in America. This dataset provides the average costs of different aspects of life for each county in America, both annually and monthly, while dividing these averages further by family type, ranging from families with 1 parent and no children to families with 2 parents and 4 children. To enhance the dataset for my own research, I added the geographical region in which each county is located based on its state (South, Midwest, Northeast, and West) and the minimum wage for each state and county pair. The former information was sourced from the U.S. Census website, and the latter was sourced from Paycom.com. See References for additional information and citations for the Economic Policy Institute’s dataset, as well as for the extra information I obtained for my region and minimum wage variables.

In terms of the layout of my report, I will first provide an overview and quality check of my data, describing how the data looks and how I formatted it for my own research. I will then start my main explorations, where I will discuss the early univariate and bivariate analyses that I conducted, and then turn to the three main questions that established the flow of my exploratory data analysis. Lastly, I will conclude with a summary of the main insights that I found throughout my research and discuss potential directions in which these analyses can be taken to the next level.

Data Overview & Quality

The EPI Family Budget dataset in its original state consisted of 27 variables and 31,430 observations, with twenty-three numerical variables and four categorical variables. I added my own minimum wage and region variables and changed the variable type of the family type and metropolitan status variables. I also had to decide how to handle the missingness in my dataset. After further investigation, I realized that all of the missing values corresponded to one specific county and its different family types. In this case, I decided it would be best to remove that county’s observations from my dataset entirely, as I felt leaving them in would cause more problems for my analysis than taking them out. My updated version of the dataset therefore includes 29 variables and 31,420 observations, with six categorical variables and twenty-three numerical variables. After this manipulation, the dataset is of high quality and should be easy to use during my analyses, as there are no remaining underlying issues or problems.
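The cleaning steps described above can be sketched in base R as follows. This is only an illustrative sketch on a tiny made-up data frame, not the code used for this report: the column names (`state`, `county`, `family`, `total_annual`), the values, and the three-state region lookup are all assumptions made for the example.

```r
# Toy stand-in for the EPI data; all column names and values are hypothetical.
budget <- data.frame(
  state        = c("IL", "IL", "TX", "TX", "NY", "NY"),
  county       = c("Cook", "Cook", "Travis", "Travis", "Kings", "Kings"),
  family       = c("1p0c", "2p4c", "1p0c", "2p4c", "1p0c", "2p4c"),
  total_annual = c(40000, 110000, NA, NA, 55000, 130000),
  stringsAsFactors = FALSE
)

# Identify every state/county pair that has at least one missing value.
has_na    <- !complete.cases(budget)
bad_pairs <- unique(budget[has_na, c("state", "county")])

# Drop all rows that belong to those pairs (every family type of that county).
key          <- paste(budget$state, budget$county)
bad_key      <- paste(bad_pairs$state, bad_pairs$county)
budget_clean <- budget[!(key %in% bad_key), ]

# Add a region variable from a state lookup (only the toy states are listed),
# and store family type as a categorical variable.
region_lookup       <- c(IL = "Midwest", TX = "South", NY = "Northeast")
budget_clean$region <- factor(unname(region_lookup[budget_clean$state]))
budget_clean$family <- factor(budget_clean$family)

str(budget_clean)
```

Checking `bad_pairs` before dropping rows is a quick way to confirm that the missingness really is confined to a single county.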

Explorations

Welcome to the heart of my analysis: a comprehensive exploration of the EPI Family Budget dataset. In this section, we embark on a journey through the layers of the data, unraveling the complexities of cost-of-living variations across multiple geographical facets in the United States.

Before diving into our main questions, let’s briefly revisit the insights gleaned from my preliminary analyses of individual and paired variables, which provided me with a foundational understanding of the dataset’s landscape.

Additionally, I would like to preface this section by noting that I have presented only the most pivotal figures, those that encapsulate the core analyses driving my insights and narrative. While these highlighted figures capture the essence of my findings, I acknowledge that a comprehensive view may be desired. For the complete array of visuals generated during my explorations, including supplementary analyses and detailed breakdowns, please refer to Appendix I, which collects all figures not shown in this section.

Univariate Analysis

In terms of my univariate analysis, I looked at both the categorical and numerical variables, finding the most interesting statistics and figures within my analysis of the different numerical categories of expenses.

However, before looking into my categories of expenses, I believe it is important to highlight the difference between the number of nonmetro and metro areas in the dataset, to gauge whether this geographical difference will have any impact on how I view and analyze my findings later.

Figure 1: Metro Status of Counties

In Figure 1 above, we see that there are many more counties in nonmetropolitan areas than in metropolitan areas. I am interested to see how this affects categories such as transportation and healthcare. Being farther from a metro area can mean more travel to obtain necessary items, and families who live farther from hospitals, or who do not have the abundance of hospitals found in very urban areas, might go to the hospital less often. I was therefore excited to look more into these relationships. From this, we can also compare metropolitan areas in the South to those in the North, and likewise nonmetropolitan areas in each region, to gauge whether geographical region matters more than metro status or vice versa.

Looking at the categories of expenses, I originally wanted to focus my research mostly on total, transportation, healthcare, and housing costs, each at both the annual and monthly levels. Below I have provided a brief explanation of the distribution of each expense at the national level. A breakdown of the other categories of expenses can be found in Appendix I - Univariate Analysis.

Figure 2: Annual Costs

In Figure 2 we see that the distribution of annual healthcare expenses has an extremely large spread in comparison to the other variables at the annual level. Within that plot, there seems to be a symmetric multimodal shape, with the average annual cost of healthcare around $12,000. Outside of this average value, there are still smaller but notable subgroups with average healthcare costs around $6,000 and $20,000. Some early potential explanations for this distribution are family size and location, as well as how the minimum wage rate and median family income relate to these higher healthcare expenses. Turning to the distribution of annual housing costs, we see a unimodal right-skewed distribution, as most families tend to spend around $12,000 on housing annually. I am surprised that there isn’t a larger spread, as I know that housing in cities tends to be more expensive than housing in nonmetropolitan areas, and that different regions have different housing market demands. The distribution of transportation expenses is also unimodal and right-skewed, with most families spending about $13,000 a year on transportation. I am not surprised by the lack of spread in this distribution, as most families, regardless of location, spend a lot of money on car expenses each year; however, I want to see if metropolitan status creates any difference in these distributions. Lastly, the distribution of annual total costs at the nationwide level has a bimodal and slightly right-skewed shape, with most families spending around $60,000 a year. Within this plot of total annual expenses, we see, as expected, that there is a lot of spread and variation away from this average that we must account for, likely relating to state and regional differences.

Figure 3: Monthly Costs

Turning our attention to the distributions of the same variables at the monthly level, we see in Figure 3, as expected, similar trends to those pointed out above. For example, the distribution of monthly healthcare costs shows a lot of variability, alongside a multimodal, slightly right-skewed shape, with an average cost around $1,200 a month. In terms of monthly housing expenses, the plot shows that families spend about $900 on housing on average, with some special cases of families spending over $2,000 a month, producing a unimodal right-skewed shape. For transportation, we again see a right-skewed unimodal shape, with families spending $1,200 each month on average. Lastly, looking at total monthly expenses, we also see a fairly large spread in the amount that family types spend monthly at the national level, with the average around $7,000 and a unimodal, right-skewed shape. For each of these plots, the distributions are as expected, both in comparison to the annual distributions and in terms of where the spread in each distribution might come from.

Bivariate Analysis

As I moved on to my bivariate analysis, I focused on gaining insights through three main areas: creating a correlation matrix; looking specifically at the relationship between median family income and total annual expenses (the cost of living) at the national level; and finally bringing the univariate analyses and insights to more micro levels, namely the regional, family type, and metro classification levels.

First, I will discuss the main observations from my correlation matrix, which helped shape the curiosities behind my main explorations.

Figure 4: National-Level Correlation Matrix

Figure 4 is a correlation matrix showing the relationships among the different categories of expenses, as well as how they relate to median family income and to a county’s in-state income ranking. Looking at the in-state income ranking column, it is expected to have a negative correlation with the different expenses, because as expenses go up, the ranking number goes down, for example from number 10 to number 1. Interestingly, however, healthcare has a slightly positive but nearly nonexistent relationship with the ranking. This means either that healthcare expenses don’t play a large part in the ranking, as they are probably fairly consistent nationally, or that for some reason an increase in healthcare expenses in a county is associated with a higher ranking number, for example going from 1st to 10th. This is super interesting, and I would love to look more into this relationship. Looking at all of the categories of expenses, we see that they correlate strongly with each other, which makes sense because an increase in one usually accompanies an increase in the others as the cost of living goes up. There are probably also geographical reasons for this, and perhaps some regions have stronger correlations between their categories of expenses than others. Lastly, I would like to highlight the relationship between minimum wage and healthcare (both annually and monthly), which exhibits a slightly negative but almost nonexistent correlation. This could possibly mean that the price of healthcare does not vary much with the minimum wage; I would not expect healthcare prices to decrease as the minimum wage increases, but rather the exact opposite. I would love to look further into this relationship as well, and to see whether, as the minimum wage increases, more or less of one’s total expenses goes toward healthcare.
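For readers who want to reproduce this kind of matrix, a minimal base R sketch is shown below. It uses simulated data, not the EPI dataset, and the column names (`housing`, `transportation`, `healthcare`, `minimum_wage`) are placeholders chosen for the example; the simulated relationships are assumptions built in on purpose so that one pair is strongly correlated and the others are near zero.

```r
set.seed(301)
n <- 200

# Simulated expenses: transportation is built to move with housing,
# while healthcare and minimum wage are generated independently.
housing <- rnorm(n, mean = 12000, sd = 3000)
toy <- data.frame(
  housing        = housing,
  transportation = 0.5 * housing + rnorm(n, sd = 1000),
  healthcare     = rnorm(n, mean = 12000, sd = 4000),
  minimum_wage   = runif(n, min = 7.25, max = 16)
)

# Pearson correlation matrix of all numeric columns, rounded for reading.
cor_mat <- round(cor(toy, use = "complete.obs"), 2)
print(cor_mat)
```

A coefficient near zero, like the one reported for minimum wage and healthcare, only indicates the absence of a linear relationship, so a scatterplot of the two variables is a useful follow-up before drawing conclusions.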

Another main area of focus that helped build my curiosities for my main exploration was the relationship between median family income and total annual expenses, or the cost of living, for each county in the United States of America.

Figure 5: Median Family Income vs. Total Annual (Cost of Living)

Figure 5 gives a good visual representation of the relationship between median family income and total annual expenses at the broadest level. Within this plot, I found it intriguing, though almost expected, how spread out the data was, both in how much families’ expenses are per year and in the disparities between income levels in the United States. Additionally, this figure brings attention to how most families’ incomes are less than their total expenses per year. This is already known to be a large problem in the United States as a whole; however, in my further analysis I hope to see whether these disparities occur more in one geographical facet than another, to help bring insight into why these disparities exist.

Lastly in my bivariate analysis, I wished to see how each of the different categories of expenses, at both the annual and monthly levels, compared at more micro levels, looking at regional, metro, and family type differences. In this section, I will only discuss the insights that I gained on the distributions of total annual and monthly expenses at these three levels, as they fit best with my further analyses and curiosities. However, I have also included in Appendix I - Bivariate Analysis visualizations and short descriptions of the distributions of the three other main variables that I previously discussed (transportation, housing, healthcare) as additional information.

Total Annual Expenses Based on Family Type

Total Annual Expenses Based on Metro Classification and Geographical Region

Total Monthly Expenses Based on Family Type

Total Monthly Expenses Based on Metro Classification and Geographical Region

Observing each set of plots, we see that the only real difference is across the different family types, as the distributions of total expenses (both annual and monthly) move further to the right, increasing, as families get larger. This makes sense, as larger families tend to have more expenses. Otherwise, at the geographical and metro status levels, we see little to no difference between total annual and monthly expenses, except that the distributions have a larger spread and extend further to the right for families that live in a metro area, the Northeast, or the West. This recognition is extremely important: although it is commonly known that living in one of these three geographical facets is usually more expensive, it raises the question of how this impacts other factors, such as median family income and housing, and whether the allocation of expenses is the same for each geographical facet. Thus, perhaps combining these different levels will help break down the data and showcase more meaningful insights.

To conclude both the univariate and bivariate analyses: I now want to look at the main factors causing the large spread in the distributions of healthcare costs at both the monthly and annual levels, while also seeing how total monthly and annual expenses break down by region or state.

Now, with this groundwork established, we can turn our attention to the core questions that I formed to drive my exploration:

1. How does the cost of living vary across different geographical facets and metro classifications?

Our first inquiry delves into how the geographical tapestry and metro classifications of different areas in America shape the variations in the cost of living. Through this question, I aim to unravel the economic nuances that define household budgets.

To start my investigation for this question I looked specifically at how the average cost of living varies by region and metro classification.

Figure 6: Average Cost of Living Based on Region and Metro Classification

In Figure 6 we can see that people in the metro Northeast have the highest average cost of living, whereas the metro South has a much lower average cost of living, similar to the cost of living in the nonmetro Northeast. A similar relationship is seen with the nonmetro West, which seems to be more expensive than both the metro Midwest and the metro South. Some possible reasons behind this could be related to infrastructure and the accessibility of things such as public transportation. Additionally, it is generally known that housing in the South and Midwest, regardless of location, tends to be less expensive than housing in the Northeast and West, where housing market demands are higher as more and more people move to these densely populated areas. The cost of living could also ultimately be a result of the taxes and local policies in these regions, as most states in the Northeast and West tend to have better healthcare and other incentives for their citizens, funded through higher taxes. It would be interesting to see whether more of families’ budgets are allocated to taxes in the Northeast and West than in the South and Midwest, as I have suggested above, and whether housing is more expensive as well.
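The grouped averages behind a figure like this can be computed with `aggregate()` in base R. The sketch below uses a handful of invented rows rather than the EPI data, and the column names (`region`, `metro`, `total_annual`) are assumptions for illustration only.

```r
# Invented county-level rows; values are for illustration, not real estimates.
toy <- data.frame(
  region       = c("Northeast", "Northeast", "Northeast", "South", "South", "South"),
  metro        = c("Metro", "Metro", "Nonmetro", "Metro", "Nonmetro", "Nonmetro"),
  total_annual = c(90000, 100000, 70000, 68000, 60000, 64000)
)

# Mean total annual cost for every region x metro combination.
avg_cost <- aggregate(total_annual ~ region + metro, data = toy, FUN = mean)

# Sort from most to least expensive so the ranking is easy to read.
avg_cost <- avg_cost[order(-avg_cost$total_annual), ]
print(avg_cost)
```

Because each county appears once per family type in the full dataset, an unweighted mean like this treats every county and family type equally, regardless of population.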

I decided to expand on this analysis by looking at income disparities and cost of living by region and metro classification, that is, at how median family income in each combination of region and metro classification relates to the average total annual expenses that families have in these geographical facets.

Figure 7: Average Cost of Living Based on Region and Metro Classification

Again, in Figure 7 we see the same relationship that we observed in Figure 6. However, we now see that people in the West have an extremely poor ratio between the money they make and the average cost of living. One possible reason for this disparity could again be the heightened housing market demands in the West as more and more people move to these states. Additionally, the results could be picking up on a larger issue of regional economic disparities, as certain areas in the West are rarer but far more affluent than others, causing high levels of skewness in the figure; or it could be due to the cost of healthcare and other income inequalities present in the West. On the flip side, we see that people in the Midwest and metro Northeast make more money than they spend on average. They are the only groups in this position, as everyone else makes less money than they have to spend on average. In real-world terms, this could be related to families in the Midwest and metro Northeast having access to good infrastructure and public transport, as well as well-established regional policies that favor higher minimum wages and other economic benefits.
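One way to make the "ratio" in this comparison explicit is to average both variables per group and divide income by cost, so that values below 1 flag groups whose median income falls short of expenses. The following is a hedged base R sketch with invented numbers; the column names `median_family_income` and `total_annual` are assumed for the example.

```r
# Invented rows for two regions; values are illustrative only.
toy <- data.frame(
  region = rep(c("West", "Midwest"), each = 4),
  metro  = rep(c("Metro", "Metro", "Nonmetro", "Nonmetro"), times = 2),
  median_family_income = c(80000, 90000, 60000, 64000,
                           84000, 88000, 70000, 74000),
  total_annual         = c(100000, 110000, 80000, 84000,
                           78000, 82000, 66000, 70000)
)

# Average income and cost for each region x metro combination.
by_group <- aggregate(cbind(median_family_income, total_annual) ~ region + metro,
                      data = toy, FUN = mean)

# Ratio below 1 means average expenses exceed median family income.
by_group$income_to_cost <- by_group$median_family_income / by_group$total_annual
by_group$shortfall      <- by_group$income_to_cost < 1
print(by_group)
```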

Next within this major question, I turned my attention to each geographical region’s nonmetro and metro distributions of spending across the different categories of expenses, and then to the same distributions broken down further by family type.

Figure 8: Distribution of Spending in the West Based on Metro Classification

Figure 9: Distribution of Spending in the South Based on Metro Classification

Figure 10: Distribution of Spending in the Northeast Based on Metro Classification

Figure 11: Distribution of Spending in the Midwest Based on Metro Classification

In Figure 8 through Figure 11, we see comparisons between nonmetropolitan and metropolitan areas for each geographical region in the United States at the macro level, that is, as averages across all family types.
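The "allocation" compared in these figures is each category’s share of total spending. A minimal base R sketch of that calculation is given below, using made-up averages for a single region; the category columns and their values are assumptions for illustration, not figures from the EPI data.

```r
# Made-up average annual spending for one region, by metro classification.
toy <- data.frame(
  metro          = c("Metro", "Nonmetro"),
  housing        = c(18000, 10000),
  transportation = c(13000, 15000),
  healthcare     = c(11000, 14000),
  childcare      = c(12000, 8000)
)
categories <- c("housing", "transportation", "healthcare", "childcare")

# Convert dollar amounts to row-wise proportions (each row sums to 1).
spend  <- as.matrix(toy[categories])
rownames(spend) <- toy$metro
shares <- prop.table(spend, margin = 1)

# Category with the largest share in each row.
largest <- categories[max.col(shares, ties.method = "first")]
names(largest) <- rownames(shares)

print(round(shares, 3))
print(largest)
```

Working with shares rather than dollar amounts makes metro and nonmetro budgets comparable even when their totals differ.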

In the West, we see that housing is allocated the largest share of total spending when living in a metro area. This result does not surprise me, as housing market demands in the West are extremely high, with some of the most expensive housing in America. On the other hand, when living in a nonmetro area in the West, transportation is allocated the largest share of the total. As the West is not known for connecting its metro areas to its nonmetro areas, I am not surprised that transportation has the highest cost, as having a car would be essential in the nonmetro areas.

When looking at the distribution of spending in the South, we see a different trend, as transportation and healthcare almost tie for the largest share of a family’s total annual expenses for those who live in metro areas. Again, this does not surprise me: although the Southern states do have public transportation, it is not as reliable as in the Midwest and Northeast, and highways are extremely important and common throughout the South, so having a car is essential, as in the nonmetro West. In terms of healthcare, I was slightly surprised by this discovery. However, after further thought, rather than looking at the South by itself in terms of healthcare expenses, we have to look at it in comparison to the other regions: in the South there are not as many policies and laws in place to help citizens with their medical expenses as in the other regions. Families in the nonmetro South share a very similar allocation of expenses with those in the metro South. The only main difference between the two groups is the price of housing. In many Southern states, land is extremely cheap and easy to find, so living outside of a Southern metro area is pretty affordable in terms of housing.

The distributions of spending in the Northeast and the Midwest were the most interesting to investigate, because in both the metro Midwest and the metro Northeast, childcare is one of the largest expenses, unlike in their nonmetro counterparts. This was super surprising to observe, as I would not have thought that metro classification played such a large role in the price of childcare. I would love to look further into this and see why this relationship occurs. Perhaps it is simply that things and services are more expensive in a city, or perhaps there is another reason behind it.

Otherwise, I was not surprised by any insights from Figure 10, as housing was the second most expensive category, which makes sense given cities such as Boston and New York City, which are densely populated and have high housing market demands. For the nonmetro Northeast, transportation is allocated the largest share of total annual expenses, which also didn’t surprise me, as a lot of work and business is in the metro areas, so to get to them people have to have a car or pay for public transportation regularly.

Lastly, looking more closely at the distribution of spending in the Midwest, we see that for families that live in metro areas, transportation is allocated the largest share of total annual expenses. I found this interesting, as I always think of Chicago and its wonderful public transportation system, so I would assume that since people don’t need cars as much to get around Chicago, transportation would be one of the lowest costs. However, I have to recognize that I am currently looking at the regional level, and not specifically at Illinois or Cook County. For families that live in nonmetro areas, healthcare and transportation are the two most expensive categories. This observation made me realize that even with Metra and Amtrak trains running through much of the Midwest, people who don’t live near either of these options (which is a lot of families) need a car for most activities, increasing their transportation costs. Looking at the high expense of healthcare, I was not surprised, as the Midwest overall does not seem to favor laws and policies that provide medical benefits for citizens, like those in the metro West and the Northeast overall.

Another point that all four plots share is that healthcare takes up more of total expenses in nonmetro areas than in metro areas. This could be because people in nonmetro areas tend to have larger families, and thus more individuals that the family has to pay for each year; however, I would also have to look at this at the family level. Additionally, there is more competition between insurance and medical plans in metro areas, leading companies to offer lower-priced plans to get the most sales, whereas this is uncommon in nonmetro areas, as they have sparser populations.

After observing how the cost of living differs across geographical facets at the overall level, without distinguishing between family types, I will now turn to doing so.

Figure 12: Distribution of Spending in the West Based on Metro Classification and Family Size

Figure 13: Distribution of Spending in the South Based on Metro Classification and Family Size

Figure 14: Distribution of Spending in the Northeast Based on Metro Classification and Family Size

Figure 15: Distribution of Spending in the Midwest Based on Metro Classification and Family Size

Looking at Figure 12 through Figure 15, one observation that really jumped out to me was that transportation is allocated slightly more of families’ total expenses in nonmetro areas than in metro areas. This makes sense, as buses and other forms of public transportation are accessible in metro areas but not in nonmetro areas, while people everywhere, if they can, love the ability to have a car and drive. In all other categories, regardless of family size, people in metro areas seem to spend more money than people in nonmetro areas, which makes sense given that metro areas are for the most part more expensive than nonmetro areas in America. I was especially surprised by how close transportation expenses were between metro and nonmetro areas, particularly for the metro Northeast and metro Midwest, which I feel are known for having the best public transportation in the United States; I had thought that families living outside of those metro areas would pay a lot more, as they have to have a car and pay the associated expenses. After further investigation, I find it interesting that the largest gap in transportation between metro and nonmetro areas occurs for families with no children. This brings to light how having children more often than not means that the parent is going to have to get a car, narrowing the gap in transportation allocation between metro and nonmetro areas.

Another interesting highlight, which I touched on briefly before, is that healthcare consistently seems to be the only category in which families in nonmetro areas allocate more of their total expenses on average than families in metro areas. Knowing that this trend exists across the board in the United States, at least at the regional level, brings attention to the financial, economic, and political disparities between living in the two types of areas. However, as expected, in every other category, regardless of family size, families who live in metro areas pay more on average than families in nonmetro areas. This is reasonable, because metro areas often have a higher cost of living due to increased demand for housing, services, and amenities, and because space in urban environments is usually more limited than in nonmetro areas. Additionally, most metro areas have higher taxes. Thus, I feel that to obtain more insights into hidden differences, I would have to go to the state and county levels.

Lastly, for this question, I compared median income to total annual expenses for each state, grouped by region, so that each state can be compared to the other states in its region to see whether they actually share similar properties at this macro level, and to see how the states individually compare to the results found in Figure 7.

Figure 16: Median Income to Cost of Living for Western States

Figure 17: Median Income to Cost of Living for Southern States

Figure 18: Median Income to Cost of Living for Northeastern States

Figure 19: Median Income to Cost of Living for Midwestern States

Looking at the Western states in Figure 16, we see that, as expected, as the number of people in the family increases, the average cost of living increases as well. However, this is not true for the median family income, which is an overarching variable in my dataset that does not vary by family type. Thus for Western states, we can conclude that income disparities start to appear when families have 4 or more people, as the median family income becomes less than the average cost of living. Another interesting observation is that the cost of living in Hawaii is much higher than in the other Western states, so perhaps this plays a part in skewing some of the allocations of expenses in the earlier figures.

In Figure 17, we again see that income disparities start to appear when families have 4 or more people, as the median family income becomes less than the average cost of living. Again there seems to be an outlier, this time DC.

Looking at the Northeastern states in Figure 18, we see that, unlike for Western and Southern states, income disparities only appear when families have 5 or more people, with the median family income falling below the average cost of living. Interestingly, New Hampshire acts as a middle ground between a cluster of states with lower median family incomes and another cluster with extremely high median family incomes. I would love to see what makes New Hampshire stand out in this region, and to explore why these distinctions and separations are occurring.

Lastly, in Figure 19, income disparities interestingly start to appear when families have 3 or more people, as the median family income falls below the average cost of living. This is earlier than in any of the other regions at this state-level view. Additionally, unlike the Western and Southern states, the Midwestern states do not seem to include an outlier: they all have about the same average cost of living, but different median family incomes.
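The comparison read off Figures 16 to 19 (the family size at which average cost of living first exceeds median family income) can also be computed directly. Below is a minimal sketch in base R, assuming a hypothetical summary table with one row per region and family size. The column names (`region`, `family_size`, `avg_cost`, `median_income`) and all values are invented for illustration and chosen only so that the toy result mirrors the thresholds described above.

```r
# Toy data: all values and column names are made up for illustration,
# not taken from the EPI Family Budget dataset.
budget <- data.frame(
  region        = rep(c("West", "Midwest"), each = 4),
  family_size   = rep(2:5, times = 2),
  avg_cost      = c(60000, 75000, 90000, 105000,
                    55000, 72000, 85000, 98000),
  median_income = rep(c(85000, 70000), each = 4)
)

# Keep only the rows where average cost of living exceeds median income.
shortfall <- budget[budget$avg_cost > budget$median_income, ]

# Smallest family size per region at which that shortfall first appears.
first_gap <- aggregate(family_size ~ region, data = shortfall, FUN = min)
print(first_gap)
```

Note that a region in which cost never exceeds income would simply be absent from `first_gap`, so a fuller analysis would need to handle that case explicitly.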

Through this exploration, I was able to gain insights into how cost of living differs across geographical regions, metro classifications, and even state lines. The most interesting insights for me pertain to how family size impacts the allocation of expenses, along with the intriguing patterns in transportation and healthcare expenditures. The variations in income disparities across different regions and metro classifications shed light on the complex economic landscape, showing where families struggle and thrive financially. Further investigation into the outliers and clusters within states would help build a deeper understanding of the nuanced factors influencing the cost of living. As we transition to the next question, these insights pave the way for uncovering more intricate details about budget allocations, patterns, and economic disparities.

Conclusions


References

  1. Economic Policy Institute (2022, March). Family Budget Map. https://www.epi.org/resources/budget/budget-map/

  2. U.S. Census Bureau (2021, October 8). Census Regions and Divisions of the United States. https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf

  3. Paycom (2023, October 2). Your 2023 Guide to Every State’s Minimum Wage. https://www.paycom.com/resources/blog/minimum-wage-rate-by-state/

Appendix I: Extra Explorations

Univariate Analysis

Figure 32: Annual Costs - Other Categories of Expenses

In Figure 32, we see that the distribution of annual food expenses has an interestingly large spread, with a somewhat multimodal, right-skewed shape and an average annual food cost of around $9,000. Even so, some families spend as little as around $3,000 a year on food, while others spend almost $20,000. This range is probably most related to variations in family size. Turning to the distribution of annual tax costs, we see a unimodal, right-skewed distribution, as most families have around $5,000 in tax expenses annually. I am surprised that there isn't a larger spread, as I know that housing in cities tends to be more expensive than housing in non-metropolitan areas, and that different regions have different tax policies. The distribution of childcare expenses produces almost two different plots: a single bar representing families without children, and a second distribution for families with children. Focusing on families with children, there seems to be a multimodal, symmetric shape, with most families spending around $12,000 on childcare annually. Similar to annual food expenses, the distribution of this expense is mostly related to how many children are in the family. Lastly, the nationwide distribution of annual costs for other necessities has a bimodal, slightly right-skewed shape, with most families spending around $6,000 a year on necessities not covered by the categories above.

Figure 33: Monthly Costs - Other Categories of Expenses

Turning our attention to the distributions of the same variables at the monthly level, we see in Figure 33, as expected, trends similar to those I pointed out before. For example, the monthly distribution of food expenses still has a relatively large spread, with a multimodal, slightly right-skewed shape. If I had time, I would have loved to see how food stamps and other economic factors play into this category of expense. The monthly distribution of taxes has a unimodal, right-skewed shape, as most families are taxed around $500 a month. However, there is again a relatively large spread, which draws attention to the different tax policies in place around the United States. Looking at childcare at the monthly level, we can again divide the figure into almost two different plots, one for families with children and one for families without. For families with children, there is a multimodal, slightly right-skewed shape, as most families spend around $1,000 a month on childcare, although this amount varies widely with family size. Finally, the monthly distribution of other necessities expenses has, like the annual one, a bimodal, slightly right-skewed shape, indicating that different families truly do allocate differently to this category. I wonder whether this varies by region, state, or county, or how this allocation is otherwise decided. For each of these plots, the distributions are as expected in comparison to the annual expense distributions.
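One possible reason the monthly distributions mirror the annual ones is that dividing every value by a constant changes the axis units but not the shape. The base R sketch below illustrates this under the assumption that monthly costs are simply annual costs divided by 12, which this report does not confirm. The values and the `skewness` helper are hypothetical and written only for this illustration.

```r
# Hypothetical annual food costs in dollars (illustrative values only).
annual_food <- c(3000, 6500, 8000, 9000, 9500, 12000, 19500)

# Assumption: monthly cost is the annual cost spread evenly over 12 months.
monthly_food <- annual_food / 12

# Moment-based sample skewness: positive values indicate a right skew.
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Rescaling changes the units but leaves the skewness unchanged.
print(all.equal(skewness(annual_food), skewness(monthly_food)))
```

If the monthly figures were derived some other way, for example with month-specific adjustments, the shapes would not have to match exactly.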

Bivariate Analysis

For all of the figures below, just as in my earlier bivariate analysis, we see that the only real difference appears when we look at the different family types, where the distributions of the category expenses (both annual and monthly) shift further and further to the right for larger families. This makes sense, as larger families tend to have more expenses in every category. Otherwise, across geographical regions and metro status, we see little to no difference in the annual and monthly category expenses, except that the distribution has a larger spread and extends further to the right for families that live in a metro area, the Northeast, or the West.

Annual Expenses

Figure 34: Healthcare Annual Expenses Based on Family Type

Figure 35: Healthcare Annual Expenses Based on Metro Classification and Geographical Region

Figure 36: Transportation Annual Expenses Based on Family Type

Figure 37: Transportation Annual Expenses Based on Metro Classification and Geographical Region

Figure 38: Housing Annual Expenses Based on Family Type

Figure 39: Housing Annual Expenses Based on Metro Classification and Geographical Region

Monthly Expenses

Figure 40: Healthcare Monthly Expenses Based on Family Type

Figure 41: Healthcare Monthly Expenses Based on Metro Classification and Geographical Region

Figure 42: Transportation Monthly Expenses Based on Family Type

Figure 43: Transportation Monthly Expenses Based on Metro Classification and Geographical Region

Figure 44: Housing Monthly Expenses Based on Family Type

Figure 45: Housing Monthly Expenses Based on Metro Classification and Geographical Region